[SOUND] In this lecture,
we're going to talk about text access.
In the previously lecture, we talked
about natural language content analysis.
We explained that the state of the art
natural language processing techniques
are still not good enough to process
a lot of unrestricted text data
in a robust manner.
As a result, bag of words representation
remains very popular in
applications like search engines.
In this lecture we're going to talk
about some high level strategies
to help users get access to the text data.
This is also important step to
convert raw, big text data into small
relevant data that are actually
needed in a specific application.
So the main question we address here is,
how can a text information system
help users get access to
the relevant text data?
We're going to cover two complementary
strategies, push vs pull.
And then we're going to talk about
two ways to implement the pull mode,
querying vs browsing.
So first, push vs pull.
These are two different ways to connect
users with the right information
at the right time.
The difference is which
takes the initiative,
which party it takes in the initiative.
In the pull mode,
the users would take the initiative to
start the information access process.
And in this case, a user typically would
use a search engine to fulfill the goal.
For example,
the user may type in a query, and
then browse results to find
the relevant information.
So this is usually appropriate for
satisfying a user's ad
hoc information need.
An ad hoc information need is
a temporary information need.
For example, you want to buy a product so
you suddenly have a need to read
reviews about related products.
But after you have collected information,
you have purchased your product, you
generally no longer need such information.
So it's a temporary information need.
In such a case, it's very hard for
a system to predict your need, and
it's more appropriate for
the users to take the initiative.
And that's why search engines
are very useful today,
because many people have many ad
hoc information needs all the time.
So as we are speaking Google probably is
processing many queries from this, and
those are all, or
mostly ad hoc information needs.
So this is a pull mode.
In contrast, in the push mode
the system will take the initiative
to push the information to the user or
to recommend the information to the user.
So in this case, this is usually
supported by a recommender system.
Now this would be appropriate if
the user has a stable information need.
For example, you may have a research
interest in some topic, and
that interest tends to stay for
a while, so it's relatively stable.
Your hobby is another example
of a stable information need.
In such a case, the system can interact
with you and can learn your interest, and
then can monitor the information stream.
If it is, the system hasn't seen any
relevant items to your interest,
the system could then take the initiative
to recommend information to you.
So for example, a news filter or
news recommender system could
monitor the news stream and
identify interest in news to you, and
simply push the news articles to you.
This mode of information access
may be also appropriate when
the system has a good knowledge
about the user's need.
And this happens in the search context.
So for example, when you search for
information on the web a search
engine might infer you might be also
interested in some related information.
And they would recommend
the information to you.
So that should remind you for example,
advertisement placed on a search page.
So this is about the, the two high level
strategies or two modes of text access.
Now let's look at the pull
mode in more detail.
In the pull mode, we can further this
in usually two ways to help users,
querying vs browsing.
In querying, a user would just enter
a query, typically a keyword query, and
the search engine system would
return relevant documents to users.
And this works well when
the user knows what exactly key,
are the keywords to be used.
So if you know exactly
what you're looking for
you tend to know the right keywords,
and then query would work very well.
And we do that all the time.
But we also know that
sometimes it doesn't work so
well, when you don't know the right
keywords to use in the query or
you want to browse information
in some topic area.
In this case browsing
would be more useful.
So in this case in the case of browsing
the users would simply navigate
into the relevant information
by following the path that's
supported by the structures
on the documents.
So the system would maintain
some kind of structures, and
then the user could follow
these structures to navigate.
So this strategy works well when the user
wants to explore information space or
the user doesn't know what
are the keywords to use in the query.
Or simply because the user, finds it
inconvenient to type in the query.
So even if a user knows what query to type
in, if the user is using a cell phone
to search for information,
then it's still hard to enter the query.
In such a case again,
browsing tends to be more convenient.
The relationship between browsing and
the query is best understood by
making an analogy to sightseeing.
Imagine if you are touring a city.
Now if you know the exact address
of a attraction, then taking a taxi
there is perhaps the fastest way,
you can go directly to the site.
But if you don't know the exact address,
you may need to walk around, or
you can take a taxi to a nearby place,
and then walk around.
It turns out that we do exactly
the same in the information space.
If you know exactly what you
are looking for, then you can
use the right keywords in your query
to find the information directly.
That's usually the fastest
way to do find information.
But what if you don't know
the exact keywords to use?
Well, your query probably won't work so
well, and you will land on some related
pages, and then you need to also walk
around in the information space.
Meaning by following the links or
by browsing,
you can then finally get
into the relevant page.
If you want to learn about a topic again,
you you will likely do a lot of browsing.
So just like you are looking
around in some area and
you want to see some interesting
attractions in a related
in the same region.
So this analogy also tells us that
today we have very good support for
querying, but we don't really
have good support for browsing.
And this is because in order to browse
effectively, we need a a map to guide us,
just like you need a map of Chicago
to tour the city of Chicago.
You need a topical map to
tour the information space.
So how to construct such a topical
map is in fact a very interesting
research question that
likely will bring us
more interesting browsing experience
on the web or in other applications.
So to summarize this lecture, we have
talked about two high level strategies for
text access, push and pull.
Push tends to be supported
by a recommender system and
pull tends to be supported
by a search engine.
Of course in the sophisticated
intent in the information system,
we should combine the two.
In the pull mode we have further
distinguished querying and browsing.
Again, we generally want to combine
the two ways to help users so
that you can support both querying and
browsing.
If you want to know more about
the relationship between pull and
push, you can read this article.
This gives a excellent discussion of the
relationship between information filtering
and information retrieval.
Here information filtering is similar
to information recommendation,
or the push mode of information access.
[MUSIC]

